Sushil K Sharma --- Q3 Split Test Analysis

Problem Statement:

Over the course of a week, I divided invites from about 3000 requests among four new variations of the quote form as well as the baseline form we've been using for the last year. Here are my results:

  • Baseline: 32 quotes out of 595 viewers
  • Variation 1: 30 quotes out of 599 viewers
  • Variation 2: 18 quotes out of 622 viewers
  • Variation 3: 51 quotes out of 606 viewers
  • Variation 4: 38 quotes out of 578 viewers

What's your interpretation of these results? What conclusions would you draw? What questions would you ask me about my goals and methodology? Do you have any thoughts on the experimental design? Please provide statistical justification for your conclusions and explain the choices you made in your analysis. For the sake of your analysis, you can make whatever assumptions are necessary to make the experiment valid, so long as you state them. So, for example, your response might follow the form "I would ask you A, B and C about your goals and methodology. Assuming the answers are X, Y and Z, then here's my analysis of the results... If I were to run it again, I would consider changing...".

1. Questions to Ask

  • What are the assumptions behind the above experiment design?
  • Are the baseline and variation samples comparable (i.e., randomly assigned)? Are the viewers for all variations drawn from the same population (e.g., gender, region, location)? It is possible that a particular location has a higher conversion rate.
  • Did we run the baseline and the variations on the same days? Conversion rates may differ across days of the week.
  • Are we testing only minor variations of the invite, or a whole new design/customer experience? A/B testing is less suitable for a wholesale redesign, since it takes customers some time to adjust to a new interface.

2. Assumptions

  1. The variations are independent, and conversions in each bucket are binomially distributed with some true underlying conversion rate.
  2. A 95% confidence level (significance level alpha = 0.05) is acceptable for this test.

3. Goals and Methodology:

I am going to follow this process to find a solution to Q3:

3(a) Step1: Explore Dataset

3(b) Step2: Perform Statistical Tests

3(c) Step3: Results

3(d) Step4: Final Thoughts

3(a) Step1: Explore Dataset


In [36]:
# read data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [37]:
# I have stored the data in a CSV file. Let's load it into a pandas DataFrame
split_test_df = pd.read_csv("split_test.csv")
split_test_df['conversion_rate'] = split_test_df['Quotes'] / split_test_df['Views']
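Since `split_test.csv` is not included here, the same DataFrame can be built inline from the numbers in the problem statement (a sketch; the column names `Bucket`, `Quotes`, `Views` match the Out[38] table below):

```python
import pandas as pd

# Same data as split_test.csv, declared inline so the analysis is reproducible
split_test_df = pd.DataFrame({
    "Bucket": ["Baseline", "Variation 1", "Variation 2", "Variation 3", "Variation 4"],
    "Quotes": [32, 30, 18, 51, 38],
    "Views": [595, 599, 622, 606, 578],
})
split_test_df["conversion_rate"] = split_test_df["Quotes"] / split_test_df["Views"]
```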

In [38]:
# Let's look at the dataframe
split_test_df


Out[38]:
Bucket Quotes Views conversion_rate
0 Baseline 32 595 0.053782
1 Variation 1 30 599 0.050083
2 Variation 2 18 622 0.028939
3 Variation 3 51 606 0.084158
4 Variation 4 38 578 0.065744
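Each observed conversion rate is only an estimate, so it helps to see its uncertainty. A rough 95% interval per bucket uses the normal approximation p_hat ± 1.96·sqrt(p_hat(1 − p_hat)/n) (a sketch; the data are re-declared so the cell is self-contained):

```python
import numpy as np

quotes = np.array([32, 30, 18, 51, 38])
views = np.array([595, 599, 622, 606, 578])
p_hat = quotes / views

# 95% normal-approximation interval: p_hat +/- 1.96 * standard error
se = np.sqrt(p_hat * (1 - p_hat) / views)
lower, upper = p_hat - 1.96 * se, p_hat + 1.96 * se

buckets = ["Baseline", "Variation 1", "Variation 2", "Variation 3", "Variation 4"]
for name, lo, hi in zip(buckets, lower, upper):
    print(f"{name}: [{lo:.4f}, {hi:.4f}]")
```

The intervals overlap substantially (e.g. Variation 3's lower bound sits below the baseline's upper bound), which is why a formal significance test is needed rather than eyeballing the bar chart.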

In [39]:
# Bar chart of conversion rate by bucket
split_test_df.plot(kind='bar', x='Bucket', y='conversion_rate')


Out[39]:
[bar chart: conversion_rate by Bucket]

In [40]:
# percent difference from baseline
# percent difference from baseline (relative to the baseline rate)
split_test_df['percent_diff_from_base'] = (split_test_df['conversion_rate'] - split_test_df['conversion_rate'][0]) * 100 / split_test_df['conversion_rate'][0]

In [41]:
split_test_df


Out[41]:
Bucket Quotes Views conversion_rate percent_diff_from_base
0 Baseline 32 595 0.053782 0.000000
1 Variation 1 30 599 0.050083 -6.876043
2 Variation 2 18 622 0.028939 -46.191720
3 Variation 3 51 606 0.084158 56.482054
4 Variation 4 38 578 0.065744 22.242647

In [42]:
# Let's plot percent difference from base in a barchart
split_test_df.plot(kind='bar', x = 'Bucket', y='percent_diff_from_base')


Out[42]:
[bar chart: percent_diff_from_base by Bucket]

Observations

  • Simply looking at the data and the bar charts above, Variations 3 and 4 have higher conversion rates than the baseline.
  • Variation 1 converts about 7% lower and Variation 2 about 46% lower than the baseline (relative difference).
  • Variation 3 converts about 56% higher and Variation 4 about 22% higher than the baseline.
  • Variation 3 looks like the winner. However, we need to test whether the difference is statistically significant.
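Before comparing any single variation to the baseline, an omnibus chi-square test of independence on the full 5×2 table (quotes vs. non-quotes per bucket) checks whether conversion differs anywhere at all (a sketch using `scipy.stats.chi2_contingency`; data repeated to keep the cell self-contained):

```python
import numpy as np
from scipy.stats import chi2_contingency

quotes = np.array([32, 30, 18, 51, 38])
views = np.array([595, 599, 622, 606, 578])

# 5x2 contingency table: converted vs. not converted, per bucket
table = np.column_stack([quotes, views - quotes])
stat, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p_value:.5f}")
```

The p-value comes out well below 0.05, so at least one bucket differs from the others, which justifies the pairwise baseline-vs-Variation-3 test that follows.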

3(b) Step2: Perform Statistical Tests

To test the significance of the baseline vs. Variation 3 conversion rates, I am going to use a two-proportion z-test. The null hypothesis is that the difference is not statistically significant (i.e., the difference in quotes occurred by chance); the alternate hypothesis is that the difference did not occur by random chance. With hundreds of viewers and well over 10 conversions and non-conversions per bucket, the Central Limit Theorem lets us approximate the distribution of the test statistic as normal. For this test I am assuming a significance level alpha = 0.05 is sufficient. Since the alternate hypothesis is two-sided, the critical value for alpha = 0.05 is 1.96 (1.645 would be the one-sided cutoff).

Null Hypothesis, H0: p1 = p2, i.e. There is no significant difference in proportions

Where: p1 is the proportion from the first population and p2 the proportion from the second.

Alternate Hypothesis, HA: p1 ≠ p2 (two-sided)

Equation for calculating the z statistic:

z = (p1_hat - p2_hat) / sqrt( p_pool * (1 - p_pool) * (1/n1 + 1/n2) ), where p_pool = (x1 + x2) / (n1 + n2) is the pooled proportion and x1, x2 are the quote counts.

Python code to calculate the z statistic:


In [60]:
# Proportion of variation 3 (i.e. conversion rate)
p1_hat = split_test_df['conversion_rate'][3] 

# Sample size of variation 3 (i.e. views)
n1 =  split_test_df['Views'][3]


# Proportion of baseline (i.e. conversion rate)
p2_hat = split_test_df['conversion_rate'][0]  

# Sample size of baseline (i.e. views)
n2 =  split_test_df['Views'][0]   


print (p1_hat, p2_hat, n1, n2)


0.08415841584158416 0.05378151260504202 606 595

In [61]:
# pooled proportion under the null hypothesis
p_pool = (split_test_df['Quotes'][3] + split_test_df['Quotes'][0]) / (n1 + n2)

# two-proportion z statistic using the pooled standard error
z = (p1_hat - p2_hat) / np.sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))

In [62]:
print(f"Z score is: {z:.4f}")


Z score is: 2.0752

3(c) Step3: Results

The z statistic above exceeds the two-sided critical value of 1.96, so we reject the null hypothesis: there is a statistically significant difference between the baseline and Variation 3 conversion rates at the 95% confidence level.
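Rather than only comparing z to a critical value, the two-sided p-value can be reported directly (a sketch using `scipy.stats.norm`; the z statistic is recomputed with the pooled proportion under H0, a standard formulation, so the cell stands alone):

```python
import numpy as np
from scipy.stats import norm

x1, n1 = 51, 606   # Variation 3: quotes, views
x2, n2 = 32, 595   # Baseline: quotes, views

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)          # pooled proportion under H0

se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se
p_value = 2 * (1 - norm.cdf(abs(z)))    # two-sided p-value
print(f"z = {z:.4f}, p = {p_value:.4f}")
```

The p-value lands just under 0.04, consistent with rejecting H0 at the 5% level but not at the 1% level, which is worth keeping in mind given that four variations were compared against one baseline.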

3(d) Step 4: Final Thoughts

If I were to run it again, I would collect more viewers for each variation. Larger samples narrow the uncertainty around each bucket's conversion rate and increase the power to detect small differences. I would also run all buckets concurrently over full weeks to average out day-of-week effects, and consider correcting for multiple comparisons, since testing four variations against one baseline inflates the chance of a false positive.
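To make "add more samples" concrete, a standard two-proportion sample-size formula, n = (z_{alpha/2} + z_beta)^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2, can estimate the viewers needed per arm to detect the observed baseline-vs-Variation-3 lift (a sketch; 80% power and alpha = 0.05 are assumed targets, not values from the original experiment):

```python
import math
from scipy.stats import norm

p1, p2 = 32 / 595, 51 / 606      # baseline and Variation 3 conversion rates
alpha, power = 0.05, 0.80        # assumed targets

z_alpha = norm.ppf(1 - alpha / 2)   # two-sided critical value (~1.96)
z_beta = norm.ppf(power)            # power quantile (~0.84)

# Required viewers per arm to detect this difference at the chosen power
n_per_arm = ((z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))
             / (p1 - p2) ** 2)
print(f"Roughly {math.ceil(n_per_arm)} viewers per arm")
```

This comes out to roughly 1,100 viewers per arm, so the original ~600 per bucket was somewhat underpowered for effects of this size, which supports running the next test longer.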
